## see ?STexampleData and browseVignettes('STexampleData') for documentation
## loading from cache
## class: SpatialExperiment
## dim: 33538 4992
## metadata(0):
## assays(1): counts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
## ENSG00000268674
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
## TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(7): barcode_id sample_id ... ground_truth cell_count
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
## [1] 33538 4992
## [1] "counts"
#counts table (an object of class dgTMatrix, which is a sparse matrix format)
## 6 x 4 sparse Matrix of class "dgTMatrix"
## AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1
## ENSG00000243485 . . .
## ENSG00000237613 . . .
## ENSG00000186092 . . .
## ENSG00000238009 . . .
## ENSG00000239945 . . .
## ENSG00000239906 . . .
## AAACACCAATAACTGC-1
## ENSG00000243485 .
## ENSG00000237613 .
## ENSG00000186092 .
## ENSG00000238009 .
## ENSG00000239945 .
## ENSG00000239906 .
#genes with some level of expression
## 21 x 11 sparse Matrix of class "dgTMatrix"
## [[ suppressing 11 column names 'CGCGCAAGGAACTACA-1', 'CGCGCATGTTTGATTG-1', 'CGCGCCCGACTTAATA-1' ... ]]
##
## ENSG00000223764 . . . . . . . . . . .
## ENSG00000187634 . . . . . . . . . . .
## ENSG00000188976 . . 2 . . . . . . 1 1
## ENSG00000187961 . . . . . . . . . . .
## ENSG00000187583 . . . . . . . . . . .
## ENSG00000187642 . . . . . . . . . . .
## ENSG00000272512 . . . . . . . . . . .
## ENSG00000188290 1 . . . . . . . . 2 .
## ENSG00000187608 . 1 . . . . 2 . . 1 .
## ENSG00000224969 . . . . . . . . . . .
## ENSG00000188157 . 1 . . 2 . . . . 1 .
## ENSG00000273443 . . . . . . . . . . .
## ENSG00000237330 . . . . . . . . . . .
## ENSG00000131591 . . . . . . . . . 1 .
## ENSG00000223823 . . . . . . . . . . .
## ENSG00000272141 . . . . . . . . . . .
## ENSG00000205231 . . . . . . . . . . .
## ENSG00000162571 . . . . . . . . . . .
## ENSG00000186891 . . . 1 . . . . . . .
## ENSG00000186827 . . . . . . . . . . .
## ENSG00000078808 . 1 2 . 1 . . . . 1 .
## 21 x 11 sparse Matrix of class "dgTMatrix"
## [[ suppressing 11 column names 'CGCGCAAGGAACTACA-1', 'CGCGCATGTTTGATTG-1', 'CGCGCCCGACTTAATA-1' ... ]]
##
## ENSG00000160294 . . . . . . . . . . .
## ENSG00000228137 . . . . . . . . . . .
## ENSG00000239415 . . . . . . . . . . .
## ENSG00000182362 . . . . . . . . 1 . .
## ENSG00000160298 . . . . . . . . . . .
## ENSG00000160299 . . 1 . 1 . . . . . .
## ENSG00000160305 . . . . . 2 . . . . .
## ENSG00000160307 1 3 1 1 4 5 1 1 . 2 1
## ENSG00000160310 . . . . 1 . . . . 2 .
## ENSG00000198888 17 44 71 16 154 97 12 14 32 167 6
## ENSG00000198763 16 59 64 11 116 63 11 12 18 123 6
## ENSG00000198804 37 85 155 25 252 176 24 27 38 335 12
## ENSG00000198712 23 79 120 23 214 170 22 25 48 242 10
## ENSG00000228253 2 . 3 . 1 . . 1 1 6 .
## ENSG00000198899 20 39 93 9 136 108 20 18 25 165 7
## ENSG00000198938 27 59 133 20 216 120 22 26 43 232 9
## ENSG00000198840 5 27 33 5 71 39 8 11 12 78 .
## ENSG00000212907 2 . 4 2 7 5 . 1 1 9 .
## ENSG00000198886 15 65 95 9 183 98 18 19 33 178 7
## ENSG00000198786 2 10 10 3 20 14 1 2 2 25 4
## ENSG00000198695 1 1 3 . 2 2 . . . 1 .
## DataFrame with 6 rows and 3 columns
## gene_id gene_name feature_type
## <character> <character> <character>
## ENSG00000243485 ENSG00000243485 MIR1302-2HG Gene Expression
## ENSG00000237613 ENSG00000237613 FAM138A Gene Expression
## ENSG00000186092 ENSG00000186092 OR4F5 Gene Expression
## ENSG00000238009 ENSG00000238009 AL627309.1 Gene Expression
## ENSG00000239945 ENSG00000239945 AL627309.3 Gene Expression
## ENSG00000239906 ENSG00000239906 AL627309.2 Gene Expression
## pxl_col_in_fullres pxl_row_in_fullres
## AAACAACGAATAGTTC-1 3913 2435
## AAACAAGTATCTCCCA-1 9791 8468
## AAACAATCTACTAGCA-1 5769 2807
## AAACACCAATAACTGC-1 4068 9505
## AAACAGAGCGACTCCT-1 9271 4151
## AAACAGCTTTCAGAAG-1 3393 7583
## DataFrame with 6 rows and 7 columns
## barcode_id sample_id in_tissue array_row
## <character> <character> <integer> <integer>
## AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673 0 0
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673 1 50
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673 1 3
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673 1 59
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673 1 14
## AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 sample_151673 1 43
## array_col ground_truth cell_count
## <integer> <character> <integer>
## AAACAACGAATAGTTC-1 16 NA NA
## AAACAAGTATCTCCCA-1 102 Layer3 6
## AAACAATCTACTAGCA-1 43 Layer1 16
## AAACACCAATAACTGC-1 19 WM 5
## AAACAGAGCGACTCCT-1 94 Layer3 2
## AAACAGCTTTCAGAAG-1 9 Layer5 4
## DataFrame with 2 rows and 4 columns
## sample_id image_id data scaleFactor
## <character> <character> <list> <numeric>
## 1 sample_151673 lowres #### 0.0450045
## 2 sample_151673 hires #### 0.1500150
#The position of a point in an image does not map directly to the spot location in Cartesian coordinates, because the image origin (0,0) is at the top-left, not the bottom-left. To handle this, we need to transform the y-axis coordinates.
## [1] 600 600
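The y-axis transformation described above can be sketched in base R. This is only an illustration, not the code that produced this output: it assumes the 600 x 600 dimensions belong to the "lowres" image, that full-resolution pixel coordinates are multiplied by that image's scaleFactor before plotting, and the variable names (`pxl_col`, `pxl_row`, `img_height`) are made up for the example.

```r
# Pixel coordinates of the first three spots shown above (full-resolution image)
pxl_col <- c(3913, 9791, 5769)
pxl_row <- c(2435, 8468, 2807)

# Assumption: we plot over the 600 x 600 "lowres" image, so the
# full-resolution coordinates are first multiplied by its scaleFactor
scale_factor <- 0.0450045
img_height <- 600

x_plot <- pxl_col * scale_factor
# Image rows count downwards from the top edge; subtracting from the image
# height gives a y value that counts upwards from the bottom edge
y_plot <- img_height - pxl_row * scale_factor

flipped <- data.frame(x = x_plot, y = y_plot)
print(flipped)
```

A spot with a larger `pxl_row` (further down the image) ends up with a smaller plotted y value, which is what a bottom-left origin requires.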
#Whether a spot is “on tissue” or not (the in_tissue column of colData) can be used to colour the spots
## [1] 33538 4992
## [1] 33538 3639
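The drop in the number of columns above is the kind of change produced by keeping only on-tissue spots. A minimal base R sketch with a toy matrix follows; the counts and the `in_tissue` vector are hypothetical. For a SpatialExperiment object, the usual equivalent would be `spe[, colData(spe)$in_tissue == 1]`, assuming the object is called `spe`.

```r
# Toy counts matrix: 3 genes x 4 spots (hypothetical values)
counts <- matrix(
  c(0, 2, 1,
    0, 0, 3,
    1, 0, 0,
    0, 1, 1),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("spot", 1:4))
)

# Hypothetical in_tissue flags: 1 = spot overlaps tissue, 0 = it does not
in_tissue <- c(0L, 1L, 1L, 1L)

dim(counts)

# Keep only the columns (spots) that are on tissue; genes are untouched
counts_tissue <- counts[, in_tissue == 1, drop = FALSE]
dim(counts_tissue)
```

Only the second dimension changes, just as in the output above, where the number of genes stays at 33538.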
#One way to quality “trim” the dataset is to calculate the percentage of mitochondrial gene expression per spot and store this information inside the colData
#identify the mitochondrial genes - their gene names start with “MT-” or “mt-”
## [1] "MT-ND1" "MT-ND2" "MT-CO1" "MT-CO2" "MT-ATP8" "MT-ATP6" "MT-CO3"
## [8] "MT-ND3" "MT-ND4L" "MT-ND4" "MT-ND5" "MT-ND6" "MT-CYB"
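Matching gene names by prefix can be sketched with `grepl()` in base R. The `gene_name` vector below is a small hand-made mix of names that appear in this output; in the real object the names would come from the `gene_name` column of rowData.

```r
# A few gene names taken from the output above
gene_name <- c("MIR1302-2HG", "FAM138A", "OR4F5", "MT-ND1", "MT-CO1", "MT-CYB")

# TRUE for names that start with "MT-" or "mt-"
is_mito <- grepl("(^MT-)|(^mt-)", gene_name)

gene_name[is_mito]
```

The `^` anchor matters: without it, any gene that merely contains "MT-" somewhere in its name would also be matched.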
#proportion of reads in a spot’s library that come from mitochondrial genes
## DataFrame with 6 rows and 13 columns
## barcode_id sample_id in_tissue array_row
## <character> <character> <integer> <integer>
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673 1 50
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673 1 3
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673 1 59
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673 1 14
## AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 sample_151673 1 43
## AAACAGGGTCTATATT-1 AAACAGGGTCTATATT-1 sample_151673 1 47
## array_col ground_truth cell_count sum detected
## <integer> <character> <integer> <numeric> <numeric>
## AAACAAGTATCTCCCA-1 102 Layer3 6 8458 3586
## AAACAATCTACTAGCA-1 43 Layer1 16 1667 1150
## AAACACCAATAACTGC-1 19 WM 5 3769 1960
## AAACAGAGCGACTCCT-1 94 Layer3 2 5433 2424
## AAACAGCTTTCAGAAG-1 9 Layer5 4 4278 2264
## AAACAGGGTCTATATT-1 13 Layer6 6 4004 2178
## subsets_mito_sum subsets_mito_detected subsets_mito_percent
## <numeric> <numeric> <numeric>
## AAACAAGTATCTCCCA-1 1407 13 16.6351
## AAACAATCTACTAGCA-1 204 11 12.2376
## AAACACCAATAACTGC-1 430 13 11.4089
## AAACAGAGCGACTCCT-1 1316 13 24.2223
## AAACAGCTTTCAGAAG-1 651 12 15.2174
## AAACAGGGTCTATATT-1 621 13 15.5095
## total
## <numeric>
## AAACAAGTATCTCCCA-1 8458
## AAACAATCTACTAGCA-1 1667
## AAACACCAATAACTGC-1 3769
## AAACAGAGCGACTCCT-1 5433
## AAACAGCTTTCAGAAG-1 4278
## AAACAGGGTCTATATT-1 4004
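The `subsets_mito_percent` column can be checked by hand from two other columns of the table above: it is `subsets_mito_sum` divided by `sum`, times 100. The sketch below copies those values into a plain data frame (the name `qc` is made up for the example) and redoes the arithmetic.

```r
# sum and subsets_mito_sum for the six spots shown above
qc <- data.frame(
  sum = c(8458, 1667, 3769, 5433, 4278, 4004),
  subsets_mito_sum = c(1407, 204, 430, 1316, 651, 621),
  row.names = c(
    "AAACAAGTATCTCCCA-1", "AAACAATCTACTAGCA-1", "AAACACCAATAACTGC-1",
    "AAACAGAGCGACTCCT-1", "AAACAGCTTTCAGAAG-1", "AAACAGGGTCTATATT-1"
  )
)

# Percentage of each spot's library that comes from mitochondrial genes
qc$mito_percent <- 100 * qc$subsets_mito_sum / qc$sum

round(qc$mito_percent, 4)
```

For the first spot this is 100 * 1407 / 8458, which agrees with the 16.6351 reported in the table.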
#In the current plot, the library sizes look good and evenly distributed.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#The horizontal red line (argument threshold in the plotQC function) shows a first guess at a possible filtering threshold for library size based on the above histogram.
## `geom_smooth()` using formula = 'y ~ x'
## `stat_xsidebin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_ysidebin()` using `bins = 30`. Pick better value with `binwidth`.
#alternative plot using ggplot
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
#It is important to look at the number of spots that are left out of the dataset by this choice of cut-off value, and at their putative spatial patterns. If we have filtered out spots with biological relevance, then we should observe some patterns on the tissue map that correlate with some of the known biological structures of the tissue. If we do observe such a phenomenon, we have probably set our threshold too high (i.e. not permissive enough).
## qc_lib_size
## FALSE TRUE
## 3628 11
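Besides plotting the flagged spots on the tissue map, a quick numerical check is to tabulate the flag against a known annotation such as `ground_truth`. The sketch below uses a small hypothetical data frame, not the real data, to show the idea: if the flagged spots pile up in one layer, the threshold may be removing biology rather than low-quality spots.

```r
# Hypothetical spots with a layer annotation and a library-size QC flag
spots <- data.frame(
  ground_truth = c("Layer1", "Layer1", "Layer3", "Layer3",
                   "Layer5", "WM", "WM", "WM"),
  qc_lib_size = c(FALSE, FALSE, FALSE, TRUE,
                  FALSE, FALSE, FALSE, TRUE)
)

# Counts of kept (FALSE) and flagged (TRUE) spots per layer
tab <- table(spots$ground_truth, spots$qc_lib_size)
print(tab)

# Fraction of spots flagged within each layer
frac_discarded <- tapply(spots$qc_lib_size, spots$ground_truth, mean)
print(frac_discarded)
```

Roughly equal fractions across layers suggest the flag is unrelated to tissue structure; a single layer with a much higher fraction is a reason to revisit the cut-off.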
#plot a histogram of the number of expressed genes across spots
#summary: a gene is “expressed” in a spot if it has at least one count in it
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# plot number of expressed genes vs. number of cells per spot
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
#Select expressed genes threshold
#We apply the chosen threshold to flag spots with (in this case) fewer than 500 expressed genes.
## qc_detected
## FALSE TRUE
## 3628 11
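Applying the threshold amounts to a logical comparison followed by `table()`. In the sketch below, the first six `detected` values are taken from the colData shown earlier and the last two are hypothetical low-complexity spots added so that something is flagged.

```r
# Number of expressed genes per spot; the last two values are hypothetical
detected <- c(3586, 1150, 1960, 2424, 2264, 2178, 312, 478)

# Flag spots with fewer than 500 expressed genes
qc_detected <- detected < 500

# FALSE = kept, TRUE = flagged for removal
table(qc_detected)
```

Storing the flag as a logical vector instead of dropping the spots straight away keeps the decision reversible and lets it be combined with the other QC flags later.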
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# plot mitochondrial read proportion vs. number of cells per spot
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'